NSF PAR Search | NSF Public Access Repository

Novel Uncertainty Quantification through Perturbation-Assisted Sample Synthesis

https://doi.org/10.1109/TPAMI.2024.3393364

Liu, Yifei; Shen, Rex; Shen, Xiaotong (January 2024, IEEE Transactions on Pattern Analysis and Machine Intelligence)
Lee, Kyoung Mu (Ed.)
This paper introduces a novel Perturbation-Assisted Inference (PAI) framework utilizing synthetic data generated by the Perturbation-Assisted Sample Synthesis (PASS) method. The framework focuses on uncertainty quantification in complex data scenarios, particularly those involving unstructured data and deep learning models. On one hand, PASS employs a generative model to create synthetic data that closely mirrors raw data while preserving its rank properties through data perturbation, thereby enhancing data diversity and bolstering privacy. By incorporating knowledge transfer from large pretrained generative models, PASS enhances estimation accuracy, yielding refined distributional estimates of various statistics via Monte Carlo experiments. On the other hand, PAI offers statistically guaranteed validity. In pivotal inference, it enables precise conclusions even without prior knowledge of the pivotal’s distribution. In non-pivotal situations, we enhance the reliability of synthetic data generation by training it with an independent holdout sample. We demonstrate the effectiveness of PAI in advancing uncertainty quantification in complex, data-driven tasks by applying it to diverse areas such as image synthesis, sentiment word analysis, multimodal inference, and the construction of prediction intervals.
Full Text Available
Boosting Summarization with Normalizing Flows and Aggressive Training

https://doi.org/10.18653/v1/2023.emnlp-main.165

Yang, Yu; Shen, Xiaotong (August 2023, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing)
Bouamor, Houda; Pino, Juan; Bali, Kalia (Ed.)
This paper presents FlowSUM, a normalizing flows-based variational encoder-decoder framework for Transformer-based summarization. Our approach tackles two primary challenges in variational summarization: insufficient semantic information in latent representations and posterior collapse during training. To address these challenges, we employ normalizing flows to enable flexible latent posterior modeling, and we propose a controlled alternate aggressive training (CAAT) strategy with an improved gate mechanism. Experimental results show that FlowSUM significantly enhances the quality of generated summaries and unleashes the potential for knowledge distillation with minimal impact on inference time. Furthermore, we investigate the issue of posterior collapse in normalizing flows and analyze how the summary quality is affected by the training strategy, gate initialization, and the type and number of normalizing flows used, offering valuable insights for future research.
Full Text Available
A hierarchical ensemble causal structure learning approach for wafer manufacturing

https://doi.org/10.1007/s10845-023-02188-z

Yang, Yu; Bom, Sthitie; Shen, Xiaotong (October 2023, Journal of Intelligent Manufacturing)

In manufacturing, causal relations between components have become crucial to automate assembly lines. Identifying these relations permits error tracing and correction in the absence of domain experts, in addition to advancing our knowledge about the operating characteristics of a complex system. This paper is motivated by a case study focusing on deciphering the causal structure of a wafer manufacturing system using data from sensors and abnormality monitors deployed within the assembly line. In response to the distinctive characteristics of the wafer manufacturing data, such as multimodality, high-dimensionality, imbalanced classes, and irregular missing patterns, we propose a hierarchical ensemble approach. This method leverages the temporal and domain constraints inherent in the assembly line and provides a measure of uncertainty in causal discovery. We extensively examine its operating characteristics via simulations and validate its effectiveness through simulation experiments and a practical application involving data obtained from Seagate Technology. Domain engineers have cross-validated the learned structures and corroborated the identified causal relationships.
Full Text Available
Nonlinear Causal Discovery with Confounders

https://doi.org/10.1080/01621459.2023.2179490

Li, Chunlin; Shen, Xiaotong; Pan, Wei (October 2023, Journal of the American Statistical Association)

This article introduces a causal discovery method to learn nonlinear relationships in a directed acyclic graph with correlated Gaussian errors due to confounding. First, we derive model identifiability under the sublinear growth assumption. Then, we propose a novel method, named the Deconfounded Functional Structure Estimation (DeFuSE), consisting of a deconfounding adjustment to remove the confounding effects and a sequential procedure to estimate the causal order of variables. We implement DeFuSE via feedforward neural networks for scalable computation. Moreover, we establish the consistency of DeFuSE under an assumption called the strong causal minimality. In simulations, DeFuSE compares favorably against state-of-the-art competitors that ignore confounding or nonlinearity. Finally, we demonstrate the utility and effectiveness of the proposed approach with an application to gene regulatory network analysis. The Python implementation is available at https://github.com/chunlinli/defuse. Supplementary materials for this article are available online.
Full Text Available
Distribution-invariant differential privacy

https://doi.org/10.1016/j.jeconom.2022.05.004

Bi, Xuan; Shen, Xiaotong (August 2023, Journal of Econometrics)

Full Text Available
Discovery and inference of a causal network with hidden confounding

https://doi.org/10.1080/01621459.2023.2261658

Chen, Li; Li, Chunlin; Shen, Xiaotong; Pan, Wei (October 2023, Journal of the American Statistical Association)

This article proposes a novel causal discovery and inference method called GrIVET for a Gaussian directed acyclic graph with unmeasured confounders. GrIVET consists of an order-based causal discovery method and a likelihood-based inferential procedure. For causal discovery, we generalize the existing peeling algorithm to estimate the ancestral relations and candidate instruments in the presence of hidden confounders. Based on this, we propose a new procedure for instrumental variable estimation of each direct effect by separating it from any mediation effects. For inference, we develop a new likelihood ratio test of multiple causal effects that is able to account for the unmeasured confounders. Theoretically, we prove that the proposed method has desirable guarantees, including robustness to invalid instruments and uncertain interventions, estimation consistency, low-order polynomial time complexity, and validity of asymptotic inference. Numerically, GrIVET performs well and compares favorably against state-of-the-art competitors. Furthermore, we demonstrate the utility and effectiveness of the proposed method through an application inferring regulatory pathways from Alzheimer’s disease gene expression data.
Full Text Available
Data-adaptive discriminative feature localization with statistically guaranteed interpretation

https://doi.org/10.1214/22-AOAS1705

Dai, Ben; Shen, Xiaotong; Chen, Lin Yee; Li, Chunlin; Pan, Wei (September 2023, The Annals of Applied Statistics)

Full Text Available
Significance Tests of Feature Relevance for a Black-Box Learner

https://doi.org/10.1109/TNNLS.2022.3185742

Dai, Ben; Shen, Xiaotong; Pan, Wei (February 2024, IEEE Transactions on Neural Networks and Learning Systems)
Embedding Learning

https://doi.org/10.1080/01621459.2020.1775614

Dai, Ben; Shen, Xiaotong; Wang, Junhui (January 2022, Journal of the American Statistical Association)

Full Text Available
Data Flush

https://doi.org/10.1162/99608f92.681fe3bd

Shen, Xiaotong; Bi, Xuan; Shen, Rex (January 2022, Harvard Data Science Review)

Data perturbation is a technique for generating synthetic data by adding ‘noise’ to raw data, with an array of applications in science and engineering, primarily in data security and privacy. One challenge for data perturbation is that it usually produces synthetic data that gain privacy protection at the cost of information loss. The information loss, in turn, causes accuracy loss for any statistical or machine learning method based on the synthetic data, weakening downstream analysis and degrading machine learning performance. In this article, we introduce and advocate a fundamental principle of data perturbation, which requires the preservation of the distribution of raw data. To achieve this, we propose a new scheme, named data flush, which ascertains the validity of the downstream analysis and maintains the predictive accuracy of a learning task. It perturbs data nonlinearly while accommodating the requirement of strict privacy protection, for instance, differential privacy. We highlight multiple facets of data flush through examples.
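The distribution-preservation principle stated in this abstract can be illustrated with a minimal sketch. This is not the authors' data flush scheme: it assumes a one-dimensional sample, substitutes the empirical CDF for the true one, and makes no differential-privacy claim. The function name `distribution_preserving_perturb` and the `noise_scale` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np


def distribution_preserving_perturb(x, noise_scale=0.1, seed=0):
    """Illustrative only (hypothetical helper, not the published method).

    Perturb a 1-D sample so that the output approximately follows the
    same marginal distribution as the input.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size

    # Step 1: map each value to its empirical CDF level in (0, 1).
    ranks = np.argsort(np.argsort(x)) + 1
    u = ranks / (n + 1)

    # Step 2: add noise on the uniform scale and wrap around modulo 1.
    # A uniform variable shifted by independent noise and wrapped
    # modulo 1 is still uniform, so the marginal law is retained.
    u_noisy = (u + rng.normal(0.0, noise_scale, size=n)) % 1.0

    # Step 3: map back through the empirical quantile function.
    return np.quantile(x, u_noisy)


if __name__ == "__main__":
    raw = np.random.default_rng(1).normal(size=2000)
    synthetic = distribution_preserving_perturb(raw)
    # Quartiles of the raw and perturbed samples should be close.
    print(np.quantile(raw, [0.25, 0.5, 0.75]))
    print(np.quantile(synthetic, [0.25, 0.5, 0.75]))
```

Because the noise is applied on the uniform scale rather than to the raw values, individual records change while summary statistics of the marginal distribution stay roughly the same, which is the property the abstract argues downstream analysis depends on.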
Full Text Available
